
    Character-Word LSTM Language Models

    We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar number of parameters, and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.
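    The concatenation of word and character embeddings described in the abstract can be sketched as follows. The embedding sizes, the number of characters taken per word, and the helper name are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's settings)
vocab_size, char_vocab_size = 1000, 50
d_word, d_char, n_chars = 64, 8, 3  # embed a word plus 3 of its characters

W_word = rng.standard_normal((vocab_size, d_word))  # word embedding table
W_char = rng.standard_normal((char_vocab_size, d_char))  # char embedding table

def char_word_input(word_id, char_ids):
    """Concatenate a word embedding with the embeddings of n_chars of its
    characters, padding shorter words with character index 0."""
    chars = (char_ids + [0] * n_chars)[:n_chars]
    return np.concatenate([W_word[word_id]] + [W_char[c] for c in chars])

x = char_word_input(42, [3, 7])  # a two-character word, padded to 3 chars
print(x.shape)  # (d_word + n_chars * d_char,) = (88,)
```

    The resulting vector would then be fed to the LSTM at each time step in place of a plain word embedding, which is what lets character information compensate for out-of-vocabulary and infrequent words.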

    A Comparison of Different Punctuation Prediction Approaches in a Translation Context

    We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTM), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step which is followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results. This research was done in the context of the SCATE project, funded by the Flemish Agency for Innovation and Entrepreneurship (IWT project 13007).
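    Punctuation prediction as sequence labeling, as compared in the abstract, amounts to assigning each unpunctuated token a label for the punctuation mark (if any) that should follow it. The label set and helper below are a toy illustration of that data representation, not the SCATE models themselves:

```python
# Turn a punctuated sentence into (tokens, labels) pairs for sequence
# labeling: each token is labeled with the punctuation that follows it,
# or "NONE". A (Bi)LSTM labeler would then be trained to predict the
# labels from the unpunctuated token sequence.
def to_labeled_sequence(punctuated):
    tokens, labels = [], []
    for tok in punctuated.split():
        if tok[-1] in ",.?!":
            tokens.append(tok[:-1])
            labels.append(tok[-1])
        else:
            tokens.append(tok)
            labels.append("NONE")
    return tokens, labels

tokens, labels = to_labeled_sequence("Yes, we agree.")
print(tokens)  # ['Yes', 'we', 'agree']
print(labels)  # [',', 'NONE', '.']
```

    The monolingual-MT approaches in the abstract instead treat the unpunctuated sentence as the source "language" and the punctuated sentence as the target, so no explicit label set is needed.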

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Definite il y a-clefts in spoken French

    This article discusses il y a-clefts in spoken French. In the linguistic literature, only one function of il y a-clefts is widely acknowledged, namely presenting a new event in the discourse. By studying corpus examples in their wider context, we found, however, that many occurrences do not easily fit the properties described in the literature. We make a distinction between presentational il y a-clefts, which can be event-presenting or entity-presenting, and specificational enumerative il y a-clefts, which give an example of a class that was implicitly or explicitly evoked in the context.
    Verwimp L., Lahousse K., ''Definite il y a-clefts in spoken French'', Journal of French Language Studies, pp. 1-28, 2016, Cambridge University Press.